-
Analyzing “large p small n” data is becoming increasingly important across a wide range of application fields. The Penalized Discriminant Analysis (PDA) index, a projection pursuit index built upon the Linear Discriminant Analysis (LDA) index, was devised in Lee and Cook (2010) to classify high-dimensional data, with promising results. Yet little is known about its performance relative to the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies comparing the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of the individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.
-
Summary: Finding a suitable representation of multivariate data is fundamental in many scientific disciplines. Projection pursuit (PP) aims to extract interesting 'non-Gaussian' features from multivariate data, and tends to be computationally intensive even when applied to low-dimensional data. In high-dimensional settings, recent work on PP (Bickel et al., 2018) addresses the asymptotic characterization of, and conjectures about, the feasible projections as the dimension grows with the sample size. To make PP practically useful and to gain theoretical insight into it in an integrated way, data-analytic tools for evaluating the behaviour of PP in high dimensions are increasingly desirable, yet they remain little explored in the literature. This paper focuses on developing computationally fast and effective approaches, central to finite-sample studies, for (i) visualizing the feasibility of PP in extracting features from high-dimensional data, as compared with alternative methods like [...] and [...], and (ii) assessing the plausibility of PP in cases where asymptotic studies are lacking or unavailable, with the goal of better understanding the practicality, limitations and challenges of PP in the analysis of large data sets.
